1.  Introduction


Suppose $T_1,\ldots,T_n$ are i.i.d. nonnegative random variables
("lifetimes") with common distribution function (d.f.) $F(\cdot)$, and suppose
$C_1,\ldots,C_n$ are i.i.d. nonnegative random variables ("censoring sequence")
with common d.f. $G(\cdot)$. Assume also that the lifetimes and the censoring
sequence are independent. In the setting of survival analysis data with
random right censorship, one observes the bivariate sample
$(X_1,\delta_1),\ldots,(X_n,\delta_n)$, where

(1)  $X_i = T_i \wedge C_i, \quad \delta_i = 1\{T_i \le C_i\},$

with $\wedge$ denoting minimum and $1\{\cdot\}$ denoting the indicator function of a
set. One question of interest in survival analysis is the estimation of
the hazard rate function $h(\cdot)$, defined as follows when it is further
assumed that $F$ has a density $f(\cdot)$:


(2)  $h(x) = \frac{d}{dx}\left[-\log \bar F(x)\right] = f(x)/\bar F(x), \quad F(x) < 1,$

with $\bar F = 1 - F$. (The quantity $H(x) = -\log \bar F(x)$ is called the cumulative
hazard function.) In the setting without censoring, parametric models of
monotone failure rate have been extensively studied (see Ch. 3 of Barlow
and Proschan (1975)). The nonparametric estimation of $h(x)$ was initiated
by Watson and Leadbetter (1964a, 1964b). Subsequent research works include
Barlow and van Zwet (1971), Ahmad (1976), Rice and Rosenblatt (1976), Ahmad
and Lin (1977) and Singpurwalla and Wong (1983). There are essentially 3
variants based on the delta-sequence smoothing introduced by Watson and
Leadbetter (1964a, 1964b) and Rice and Rosenblatt (1976) (the third
variant):

(3)  $h_n^{(1)}(x) = \int k_n(x-u)\,dF_n(u) \Big/ \bar F_n(x), \quad F_n(x) < 1;$

(4)  $h_n^{(2)}(x) = \sum_{j=1}^{n} k_n(x - X_{(j)}) \cdot \frac{1}{n-j+1};$

(5)  $h_n^{(3)}(x) = \int k_n(x-u)\,dH_n(u) = \sum_{j=1}^{n} k_n(x - X_{(j)}) \cdot \log\left(1 + \frac{1}{n-j+1}\right),$

where $F_n$ is the empirical d.f., $H_n$ is the empirical cumulative hazard
function, $X_{(j)}$ is the $j$-th order statistic from the sample $\{X_i,\ i=1,\ldots,n\}$,
and $\{k_n(\cdot)\}$ is a delta-sequence (see Walter and Blum (1979)), which in the
kernel case (see Rosenblatt (1956)) is specialized by taking

(6)  $k_n(v) = \frac{1}{b_n}\, k\left(\frac{v}{b_n}\right),$

where $k$ is usually a bounded, symmetric density function, and $\{b_n\}$ is a
so-called band sequence such that $b_n \to 0$, $n b_n \to \infty$ as $n \to \infty$. The method
of analysis in the uncensored case in Rice and Rosenblatt (1976) parallels
that of kernel density estimation and exploits heavily the strong
approximation of the empirical process by a Brownian bridge (Komlós, Major
and Tusnády (1975)).
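The kernel form of the delta-sequence can be illustrated numerically. The sketch below is only an illustration under stated assumptions: it takes the Epanechnikov density as one admissible choice of $k$, uses order-statistic weights $1/(n-j+1)$ for the uncensored hazard estimate, and replaces a random sample by deterministic Exp(1) quantiles (true hazard 1). All function names are hypothetical.

```python
import math

def epanechnikov(v):
    """Bounded, symmetric density supported on [-1, 1] (an assumed choice of k)."""
    return 0.75 * (1.0 - v * v) if abs(v) <= 1.0 else 0.0

def k_n(v, b):
    """Kernel delta-sequence k_n(v) = k(v / b) / b with bandwidth b > 0."""
    return epanechnikov(v / b) / b

def kernel_mass(b, steps=20000):
    """Midpoint-rule approximation of the integral of k_n over [-b, b]."""
    h = 2.0 * b / steps
    return sum(k_n(-b + (i + 0.5) * h, b) for i in range(steps)) * h

def hazard_uncensored(x, sample, b):
    """Sum of k_n(x - X_(j)) / (n - j + 1) over the order statistics."""
    xs = sorted(sample)
    n = len(xs)
    return sum(k_n(x - xj, b) / (n - j + 1) for j, xj in enumerate(xs, start=1))

# Deterministic Exp(1) quantiles stand in for a sample; the true hazard is 1.
n = 2000
sample = [-math.log(1.0 - (j - 0.5) / n) for j in range(1, n + 1)]
print(kernel_mass(0.3), hazard_uncensored(1.0, sample, 0.3))
```

Each $k_n$ integrates to one for every bandwidth, which is what lets the smoothed estimate concentrate around $x$ as $b_n \to 0$.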

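When the data are subjected to random right censoring, the problem
becomes more complex, primarily because the estimate of $F(\cdot)$, due to
Kaplan and Meier (1958), now takes on a product form:

(7)  $KM_n(x) = 1 - \prod_{X_{(i)} \le x} \left(\frac{n-i}{n-i+1}\right)^{\delta_{(i)}}$  if $x < X_{(n)}$;
     $KM_n(x) = 1$  if $x \ge X_{(n)}$ and the largest observation is uncensored,

where $\delta_{(i)}$ is the censoring indicator attached to $X_{(i)}$.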
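A minimal sketch of the product-form estimate may help fix ideas. It assumes no ties among the observations (as holds almost surely for continuous $F$ and $G$) and evaluates the standard Kaplan-Meier product; the helper names `kaplan_meier` and `km_at` are illustrative only.

```python
def kaplan_meier(data):
    """Return [(x_(i), KM_n(x_(i)))] for data = [(x_i, delta_i)], assuming no ties."""
    ordered = sorted(data)
    n = len(ordered)
    surv = 1.0  # running product, i.e. 1 - KM_n
    steps = []
    for i, (x, delta) in enumerate(ordered, start=1):
        if delta == 1:  # only uncensored observations contribute a factor
            surv *= (n - i) / (n - i + 1)
        steps.append((x, 1.0 - surv))
    return steps

def km_at(steps, x):
    """Evaluate the right-continuous step function at x."""
    value = 0.0
    for xi, fi in steps:
        if xi > x:
            break
        value = fi
    return value

uncensored = [(1.0, 1), (2.0, 1), (3.0, 1), (4.0, 1)]
censored = [(1.0, 1), (2.0, 0), (3.0, 1)]
print(km_at(kaplan_meier(uncensored), 2.0))  # reduces to the empirical d.f.
print(km_at(kaplan_meier(censored), 2.5))
```

Without censoring the product telescopes to the empirical d.f., which is why the uncensored theory is recovered as a special case.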


Since many well-studied properties of the empirical d.f. cannot be
readily transferred to the Kaplan-Meier estimator, several researchers
circumvented the technical difficulty by considering an equivalent problem
on the uncensored observations (for example, Blum and Susarla (1980),
Burke (1983), Yandell (1983), Liu and Van Ryzin (1985)). Some researchers
(for instance Ramlau-Hansen (1983)) employed the method of counting
processes studied by Aalen (1978) and Gill (1983). Still others (Földes,
Rejtő and Winter (1981), Burke and Horváth (1984)) used a Chung-Smirnov
type result on the Kaplan-Meier estimator. To the credit of Tanner and
Wong (1983), expressions for the bias and variance in the kernel case
(essentially the form $h_n^{(2)}(x)$ given in (4)) were obtained by direct
calculations, and asymptotic normality was proved by appealing to Hájek's
projection. Tanner (1983) and Liu and Van Ryzin (1985) also considered
the variable kernel case along the line of the nearest-neighbor method
(see Mack and Rosenblatt (1979)). Padgett and McNichols (1984) gave a
review of density and failure rate estimators from censored data.

Our present research is motivated by a recent result of Lo and Singh
(1984) which establishes a strong uniform approximation of the Kaplan-Meier
estimator by an average of i.i.d. random variables with a
sufficiently small error. This allows for a more traditional approach to
the hazard estimation problem. As contrasted with the approaches mentioned
in the paragraph above, our method will be a direct one. Although it will
become apparent that we could equally well have considered the variants
$h_n^{(2)}(x)$ or $h_n^{(3)}(x)$, since there have been fewer investigations carried out
for $h_n^{(1)}$ with censored data, the estimator we use will be of the form
given by $h_n^{(1)}(x)$ (see (3)) with $F_n(x)$ replaced by a modified version
$\Gamma_n(x)$ of the Kaplan-Meier estimator, defined as follows to avoid the
possibility that $KM_n(x) = 1$:


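(8)  $\Gamma_n(x) = 1 - \prod_{X_{(i)} \le x} \left(\frac{n-i+1}{n-i+2}\right)^{\delta_{(i)}}$  if $x < X_{(n)}$;
     $\Gamma_n(x) = \Gamma_n(X_{(n)})$  if $x \ge X_{(n)}$ and the largest observation is uncensored.

It is easily checked that $\bar\Gamma_n(x) = 1 - \Gamma_n(x) \ge (n+1)^{-1}$ for all $x$, and that

(9)  $\sup_{0 \le x \le T} |KM_n(x) - \Gamma_n(x)| = O(n^{-1})$  a.s.,

for any $0 < T < \inf\{t > 0: L(t) = 1\}$, where $\bar L(x) = 1 - L(x) = \bar F(x)\cdot\bar G(x) = P(T_i > x,\ C_i > x)$.
(Hereafter, a.s. will be an abbreviation for "almost surely.")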
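The effect of the modification can be checked on a small example. The sketch below assumes the modified factors are $(n-i+1)/(n-i+2)$ at uncensored order statistics, compared with the usual Kaplan-Meier factors $(n-i)/(n-i+1)$; with no censoring the modified survival product ends exactly at $1/(n+1)$, so it never reaches zero. The function name is hypothetical.

```python
def survival_products(data):
    """Return (km, modified): survival values after each ordered observation."""
    ordered = sorted(data)
    n = len(ordered)
    km, mod = 1.0, 1.0
    km_path, mod_path = [], []
    for i, (_, delta) in enumerate(ordered, start=1):
        if delta == 1:
            km *= (n - i) / (n - i + 1)        # Kaplan-Meier factor
            mod *= (n - i + 1) / (n - i + 2)   # assumed modified factor
        km_path.append(km)
        mod_path.append(mod)
    return km_path, mod_path

data = [(float(i), 1) for i in range(1, 5)]   # n = 4, no censoring
km_path, mod_path = survival_products(data)
print(min(mod_path), 1.0 / (len(data) + 1))   # the lower bound is attained here
print(max(abs(a - b) for a, b in zip(km_path[:-1], mod_path[:-1])))
```

In this uncensored case the two survival products differ by $i/(n(n+1))$ after the $i$-th observation, which is at most $1/(n+1)$.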

In  Section  2,  we  state  the  preliminaries  needed  for  our  presentation. 
In  Section  3,  we  focus  our  attention  on  kernel  density  estimation  under 
censoring  via  strong  approximation.  In  Section  4,  we  give  the 
consistency,  asymptotic  normality  and  mean  squared  error  expression  of  our 
hazard  rate  estimate.  Finally,  in  the  last  section,  we  conclude  with 
relevant  comments  and  some  comparison  with  the  nearest  neighbor  method. 




2.  Preliminaries 

We will concentrate our analysis on the kernel method. We assume
throughout our discussion that $L(x) < 1$ for a given point $x$ under
consideration. The assumptions we make on the kernel $k$ are as follows:

(k1)  $k(x)$ is a symmetric density function.

(k2)  $k(x)$ is compactly supported with support $[-c,c]$.

(k3)  $k$ is continuous on its support.

(k4)  $k$ is of bounded variation with total variation $|k|$.

These assumptions are the "usual" ones encountered in the kernel
method of curve estimation. We will comment on the use of kernels with
vanishing moments in the last section. The estimates that we consider
are modelled after $h_n^{(1)}(x)$ (we continue to label these as $h_n^{(1)}(x)$ for
convenience):

(10)  $h_n^{(1)}(x) = \int \frac{1}{b_n}\, k\left(\frac{x-u}{b_n}\right) d\Gamma_n(u) \Big/ \bar\Gamma_n(x) = f_n(x) / \bar\Gamma_n(x),$

where $\{b_n\}$ is a band sequence satisfying initially

(b1)  $b_n \to 0$ as $n \to \infty$.
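The estimate in (10) can be sketched directly, since the integral against the modified Kaplan-Meier estimate is a weighted sum over its jumps at the uncensored observations. The sketch below is an illustration under assumptions: an Epanechnikov kernel, modified factors $(n-i+1)/(n-i+2)$, no ties, and simulated Exp(1) lifetimes censored by Exp(0.5) times. All names are hypothetical.

```python
import random

def epanechnikov(v):
    """Assumed kernel k: symmetric density on [-1, 1]."""
    return 0.75 * (1.0 - v * v) if abs(v) <= 1.0 else 0.0

def simulate(n, seed=0):
    """Exp(1) lifetimes censored by Exp(0.5) times; returns [(x_i, delta_i)]."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        t, c = rng.expovariate(1.0), rng.expovariate(0.5)
        data.append((min(t, c), 1 if t <= c else 0))
    return data

def kernel_hazard(x, data, b):
    """f_n(x) / (1 - Gamma_n(x)), with f_n a kernel-weighted sum of the jumps."""
    ordered = sorted(data)
    n = len(ordered)
    surv = 1.0        # running value of 1 - Gamma_n
    surv_at_x = 1.0   # value of 1 - Gamma_n at the point x
    f_n = 0.0
    for i, (xi, delta) in enumerate(ordered, start=1):
        if delta == 1:
            new_surv = surv * (n - i + 1) / (n - i + 2)
            f_n += epanechnikov((x - xi) / b) / b * (surv - new_surv)
            surv = new_surv
        if xi <= x:
            surv_at_x = surv
    return f_n / surv_at_x

data = simulate(3000)
print(kernel_hazard(0.5, data, 0.25))  # the true hazard of Exp(1) is 1
```

Because the modified survival estimate is bounded below by $1/(n+1)$, the ratio is always well defined.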

To analyze the asymptotic behavior of $h_n^{(1)}(x)$, it suffices to
analyze that of $f_n(x)$. As mentioned earlier, our technique is motivated
by the strong representation result (Theorem 1) of Lo and Singh (1984).
In Lemma 1 we shall show a modified version of their result. We begin
with some notation. Let $L_1(t) = P(X_i \le t,\ \delta_i = 1)$. For positive real
$z$ and $x$, and $\delta$ taking values 0 or 1, let

$\xi(z,\delta,x) = -g(z \wedge x) + [\bar L(z)]^{-1}\, I(z \le x \text{ and } \delta = 1),$

where $g(y) = \int_0^y [\bar L(s)]^{-2}\, dL_1(s)$ and $I(\cdot)$ is the indicator
function. Let $\xi_i(x) = \xi(X_i,\delta_i,x)$. Let $T$ be any point with $L(T) < 1$.
Note that the random variables $\xi_i(x)$ are bounded, uniformly in $0 \le x \le T$,
$E\xi_i(x) = 0$, and $\mathrm{Cov}(\xi_i(x), \xi_i(y)) = g(x \wedge y)$ (cf. Lo and Singh (1984)).


Lemma 1.  Assuming that $F$ is continuous, one can write

(11)  $\log \bar\Gamma_n(x) - \log \bar F(x) = -\frac{1}{n}\sum_{i=1}^{n} \xi_i(x) + R_n(x),$  where

(12)  $P\left(\sup_{0 \le x \le T} |R_n(x)| > a_n\right) = O(n^{-\beta}),$

for any $\beta > 0$, with $a_n = \theta\cdot[\log n/n]^{3/4}$ for some constant $\theta > 0$ depending
on $\beta$.

Proof.  The proof is given in the Appendix.


Remark:  Formula (12) is replaced by

(13)  $\sup_{0 \le x \le T} |R_n(x)| = O\left(n^{-3/4}(\log n)^{3/4}\right)$  a.s.

in Theorem 1 of Lo and Singh (1984).

It follows from the Borel-Cantelli Lemma that (12) implies (13): taking
$\beta > 1$ makes the probabilities in (12) summable in $n$. Hence
Lemma 1 is a stronger result than Theorem 1 of Lo and Singh (1984).

Let $\zeta_i(x) = \bar F(x)\cdot\xi_i(x)$.


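Lemma 2.  Assume that $F$ is continuous. Then

(14)  $\Gamma_n(x) - F(x) = \frac{1}{n}\sum_{i=1}^{n} \zeta_i(x) + r_n(x),$  where

(15)  $\sup_{0 \le x \le T} E|r_n(x)|^{\alpha} = O(a_n^{\alpha})$  for any $\alpha \ge 1$.

Proof.  We shall only demonstrate the case $\alpha = 1$. Since the $\xi_i(x)$'s are
uniformly bounded and $(n+1)^{-1} \le \bar\Gamma_n(x) \le 1$ for $x \in [0,T]$, we have
$\sup_{0 \le x \le T} |R_n(x)| = O(\log(n+1))$, and hence

(16)  $\sup_{0 \le x \le T} E|R_n(x)| \le \sup_{0 \le x \le T} E[\,|R_n(x)| \cdot 1\{|R_n(x)| > a_n\}\,] + \sup_{0 \le x \le T} E[\,|R_n(x)| \cdot 1\{|R_n(x)| \le a_n\}\,]$
      $\le \sup_{0 \le x \le T} P(|R_n(x)| > a_n) \cdot O(\log(n+1)) + a_n$
      $= O(a_n)$,  by Lemma 1.

Similarly, one can show that

(17)  $\sup_{0 \le x \le T} E(R_n(x)^2) = O(a_n^2).$

Now by Taylor's expansion,

$-[\Gamma_n(x) - F(x)] = \bar\Gamma_n(x) - \bar F(x)$
$= \exp\{\log \bar\Gamma_n(x)\} - \exp\{\log \bar F(x)\}$
$= [\log \bar\Gamma_n(x) - \log \bar F(x)] \cdot \bar F(x) + A_n \cdot [\log \bar\Gamma_n(x) - \log \bar F(x)]^2$
$= -\frac{1}{n}\sum_{i=1}^{n} \zeta_i(x) + \bar F(x) \cdot R_n(x) + A_n \cdot [\log \bar\Gamma_n(x) - \log \bar F(x)]^2,$

where $A_n$ is between $\bar\Gamma_n(x)$ and $\bar F(x)$ and is therefore bounded by one.
It now follows from (16) and (17) that

$\sup_{0 \le x \le T} E|\log \bar\Gamma_n(x) - \log \bar F(x)|^2 = \sup_{0 \le x \le T} E\left[-\frac{1}{n}\sum_{i=1}^{n} \xi_i(x) + R_n(x)\right]^2$
$\le \sup_{0 \le x \le T} 2\left[E\left(\frac{1}{n}\sum_{i=1}^{n} \xi_i(x)\right)^2 + E(R_n(x)^2)\right]$
$\le \sup_{0 \le x \le T} 2n^{-1}\,\mathrm{Var}(\xi_1(x)) + O(a_n^2)$
$= O(n^{-1}) + O(a_n^2).$

Hence

$\sup_{0 \le x \le T} E|r_n(x)| \le \sup_{0 \le x \le T} \bar F(x) \cdot E|R_n(x)| + \sup_{0 \le x \le T} E|\log \bar\Gamma_n(x) - \log \bar F(x)|^2$
$= O(a_n) + O(n^{-1}) + O(a_n^2)$
$= O(a_n).$  ■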

Finally, we state a lemma which by now is a standard device in the
kernel estimation literature:

Lemma 3:  Assume the kernel $k$ is a bounded density. Let $g$ be an
integrable function.

(a)  If $\{b_n\}$ is a sequence of positive numbers such that $b_n \to 0$ as $n \to \infty$,
then

(18)  $\lim_{n \to \infty} \int \frac{1}{b_n}\, k\left(\frac{x-u}{b_n}\right) g(u)\, du = g(x)$

for every continuity point $x$ of $g$.
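The convergence in (18) is easy to observe numerically. The following sketch assumes an Epanechnikov kernel and the test function $g(u) = e^{-u}$ for $u \ge 0$, and uses the substitution $u = x - b v$ so that the integral runs over the kernel's support $[-1,1]$; the midpoint rule and all names are illustrative choices.

```python
import math

def epanechnikov(v):
    """Assumed bounded density k on [-1, 1]."""
    return 0.75 * (1.0 - v * v) if abs(v) <= 1.0 else 0.0

def smoothed(g, x, b, steps=4000):
    """Midpoint-rule value of the integral of k(v) g(x - b v) over [-1, 1]."""
    h = 2.0 / steps
    total = 0.0
    for i in range(steps):
        v = -1.0 + (i + 0.5) * h
        total += epanechnikov(v) * g(x - b * v)
    return total * h

def g(u):
    """Integrable test function, continuous at x = 1."""
    return math.exp(-u) if u >= 0 else 0.0

errors = [abs(smoothed(g, 1.0, b) - g(1.0)) for b in (0.5, 0.1, 0.01)]
print(errors)  # the gap to g(1) shrinks as the bandwidth decreases
```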


where 


(21)  $\beta_n(x) = \int_{-c}^{c} f(x - v b_n)\, k(v)\, dv - f(x)$

is essentially the bias of $f_n(x)$;

(22)  $\sigma_n(x) = \frac{1}{n b_n} \sum_{i=1}^{n} \int_{-c}^{c} \zeta(X_i,\delta_i,x - v b_n)\, dk(v)$

is the random fluctuation component of $f_n(x)$ (we note that the integral
is well-defined for $n$ large enough because $k$ is compactly supported), and

(23)  $e_n(x) = \frac{1}{b_n} \int_{-c}^{c} r_n(x - v b_n)\, dk(v)$

is the error of the approximation. It is easily checked that

(24)  $\sup_{0 \le x \le T} |e_n(x)| = O\left((\log n/n)^{3/4} \cdot \frac{1}{b_n}\right)$

by Lemma 2 and the fact that $k$ is of bounded variation (condition (k4)).

The process

(25)  $\bar\zeta(t) = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \zeta(X_i,\delta_i,t), \quad 0 \le t \le T,$

has mean zero and covariance

(26)  $\gamma(s,t) = E[\bar\zeta(s)\,\bar\zeta(t)] = \bar F(s)\,\bar F(t) \int_0^{s \wedge t} [\bar L(u)]^{-2}\, dL_1(u),$

where we recall $\bar L(t) = \bar F(t) \cdot \bar G(t)$ and $L_1(t) = P(X_i \le t,\ \delta_i = 1)$. One notes
that this agrees with the covariance of the Kaplan-Meier process obtained
by Breslow and Crowley (1974) and reduces to the usual covariance of the
empirical process in the absence of censoring (see Hall and Wellner
(1980)). $\sigma_n(t)$ is thus a process with mean zero and covariance


(27)  $E[\sigma_n(t)\,\sigma_n(s)] = \frac{1}{n b_n^2} \int_{-c}^{c}\int_{-c}^{c} \gamma(t - u b_n,\ s - v b_n)\, dk(u)\, dk(v).$

We now summarize our findings in the following


Proposition 1:  Suppose $F$ is absolutely continuous with density $f(x) > 0$
at $x$. Suppose $k$ is of bounded variation and is continuous. Then $f_n(x)$
admits the strong approximation on the interval $[0,T]$:

(28)  $f_n(x) = f(x) + \beta_n(x) + \sigma_n(x) + e_n(x),$

where $\beta_n(x)$, $\sigma_n(x)$, $e_n(x)$ are defined in (21), (22) and (23) respectively,
and $e_n$ satisfies (24).

In view of the previous lemmas and the above proposition, we have the
following consequences:

Corollary 1:  (Strong pointwise consistency.)  Suppose $k$ satisfies
(k1) - (k4), $\{b_n\}$ satisfies (b1), and additionally,

(b2)  $(n/\log\log n)^{1/2} \cdot b_n \to \infty$ as $n \to \infty$;

$f(x)$ exists and is continuous at $x$. Then $f_n(x) \to f(x)$ a.s.
as $n \to \infty$.

Corollary 2:  (Bias and variance.)  Suppose $k$ satisfies (k1) - (k4),
$f(x) > 0$, and that $f$ is twice continuously differentiable at $x$. Then

(29)  $E f_n(x) = f(x) + \frac{f''(x)}{2} \int_{-c}^{c} v^2 k(v)\, dv \cdot b_n^2 + o(b_n^2) + O(b_n^{-1} a_n),$

(30)  $\mathrm{Var}\, f_n(x) = (n b_n)^{-1}\, \frac{f(x)}{\bar G(x)} \int_{-c}^{c} k^2(v)\, dv + O(n^{-1}) + O((a_n/b_n)^2).$
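Proof:  We shall demonstrate (30) only. Consider first

$\mathrm{Var}\,\sigma_n(x) = (n b_n^2)^{-1} \int_{-c}^{c}\int_{-c}^{c} \bar F(x - u b_n)\,\bar F(x - v b_n)\, g[(x - u b_n) \wedge (x - v b_n)]\, dk(u)\, dk(v),$

where $g(y) = \int_0^y [\bar L(t)]^{-2}\, dL_1(t)$ has Lebesgue derivative

(31)  $g'(t) = [\bar L(t)]^{-2}\, \frac{dL_1(t)}{dt} = f(t)/[\bar G(t) \cdot \bar F^2(t)].$

Since $k$ is symmetric, a two-term Taylor expansion argument yields

(32)  $\mathrm{Var}\,\sigma_n(x) = \mathrm{Var}\,\sigma_n^*(x) + O(n^{-1}),$

where

$\mathrm{Var}\,\sigma_n^*(x) = \frac{\bar F^2(x)}{n b_n^2} \int_{-c}^{c}\int_{-c}^{c} g[(x - u b_n) \wedge (x - v b_n)]\, dk(u)\, dk(v).$

Using integration by parts, we have

$\int_{-c}^{c} g[(x - u b_n) \wedge (x - v b_n)]\, dk(u) = \int_{x - c b_n}^{x - v b_n} k\left(\frac{x - w}{b_n}\right) dg(w).$

Thus by Fubini's Theorem and a change of variable, we obtain

$\mathrm{Var}\,\sigma_n^*(x) = \frac{\bar F^2(x)}{n b_n^2} \int_{x - c b_n}^{x + c b_n} k^2\left(\frac{x - w}{b_n}\right) dg(w)$
$= \frac{\bar F^2(x)}{n b_n^2} \int_{x - c b_n}^{x + c b_n} k^2\left(\frac{x - w}{b_n}\right) \frac{f(w)}{\bar G(w)\,\bar F^2(w)}\, dw,$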
where the last equality follows by (31). Finally, another change of
variable and expansions applied to $f/(\bar G\,\bar F^2)$ lead to the following
approximation

(33)  $\mathrm{Var}\,\sigma_n^*(x) = \frac{f(x)}{\bar G(x)} \int_{-c}^{c} k^2(v)\, dv \cdot \frac{1}{n b_n} + O(n^{-1}).$

Next, observe that by Lemma 2, for $n$ large enough, since $b_n \to 0$ as
$n \to \infty$, we have

(34)  $\mathrm{Var}\, e_n(x) \le E(e_n^2(x)) = \frac{1}{b_n^2} \int_{-c}^{c}\int_{-c}^{c} E[r_n(x - u b_n)\, r_n(x - v b_n)]\, dk(u)\, dk(v) = O((a_n/b_n)^2).$

Thus (30) follows by applying (32), (33), (34) and Schwarz's inequality
to an expansion of $\mathrm{Var}\, f_n(x)$ via (28).  ■

Corollary 3:  (Asymptotic normality.)  Suppose $k$ satisfies (k1) - (k4),
$b_n = o(n^{-1/5})$, and

(b3)  $\frac{n^{1/2}}{(\log n)^{3/2}} \cdot b_n \to \infty$ as $n \to \infty$.

Then

$\sqrt{n b_n}\, [f_n(x) - f(x)] \to_d N\left(0,\ \frac{f(x)}{\bar G(x)} \int_{-c}^{c} k^2(v)\, dv\right)$

as $n \to \infty$. Here $\to_d$ means convergence in distribution.

Remark:  Putting $b_n = O(n^{-\alpha})$, the conditions in Corollary 3 say $1/5 < \alpha < 1/2$.


4.  Kernel  estimation  of  the  hazard  rate 


We begin by stating the strong consistency of $h_n^{(1)}(x)$:

Theorem 1.  Let $k$ satisfy (k1) - (k4), and let $\{b_n\}$ satisfy (b1) and (b2).

(a)  If $f$ is continuous, then $h_n^{(1)}(x) \to h(x)$ a.s. as $n \to \infty$.

(b)  If $f$ is uniformly continuous, then for any $T$ with $L(T) < 1$,
$h_n^{(1)}(x) \to h(x)$ uniformly a.s. on $[0,T]$ as $n \to \infty$.


Proof.  Since $\Gamma_n(x)$ estimates $F(x)$ uniformly a.s., the pointwise result (a)
is a direct consequence of Corollary 1. For (b), the proof is also
standard, noting that by Csörgő and Horváth (1983),

$\sup_{0 \le x \le T} |KM_n(x) - F(x)| = O\left(n^{-1/2} (\log\log n)^{1/2}\right)$  a.s.


In order to establish the asymptotic normality of $h_n^{(1)}(x)$, write

$\sqrt{n b_n}\, [h_n^{(1)}(x) - h(x)] = \sqrt{n b_n} \left\{ f_n(x) \left[\frac{1}{\bar\Gamma_n(x)} - \frac{1}{\bar F(x)}\right] + [f_n(x) - f(x)] / \bar F(x) \right\}.$

It suffices to show the first term on the right converges to zero in
probability. Now

(35)  $\sqrt{n b_n} \left\{ f_n(x) \left[\frac{\bar F(x) - \bar\Gamma_n(x)}{\bar\Gamma_n(x)\,\bar F(x)}\right] \right\} = \sqrt{n}\, [\bar F(x) - \bar\Gamma_n(x)] \cdot \sqrt{b_n}\, f_n(x)\, [\bar\Gamma_n(x)\,\bar F(x)]^{-1}.$
Since $\sqrt{n}\,[\Gamma_n(x) - F(x)]$ tends in distribution to a normal random variable,
$f_n(x)$ converges to $f(x)$ a.s. by Corollary 1, $\sqrt{b_n} \to 0$, and clearly
$[\bar\Gamma_n(x)\,\bar F(x)]^{-1}$ converges to $[\bar F(x)]^{-2}$ a.s., we have by Slutsky's Theorem that the
expression in (35) tends to zero in probability. To summarize, we have

Theorem 2:  Suppose $F$ is absolutely continuous with density $f(x) > 0$,
suppose the kernel $k$ satisfies (k1)-(k4), and suppose the band sequence
$\{b_n\}$ satisfies (b3) and additionally that $b_n = o(n^{-1/5})$. Then we have

(36)  $\sqrt{n b_n}\, [h_n^{(1)}(x) - h(x)] \to_d N\left(0,\ \frac{h(x)}{\bar L(x)} \int_{-c}^{c} k^2(v)\, dv\right)$

as $n \to \infty$.
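One practical use of a limit of this type is a pointwise confidence interval. The sketch below is a plug-in illustration, not a procedure given in the text: it assumes the limiting variance $h(x)\int k^2 / \bar L(x)$, replaces $h(x)$ by its estimate, estimates $\bar L(x) = P(X_i > x)$ by the fraction of observations exceeding $x$, and uses $\int k^2 = 0.6$ for an assumed Epanechnikov kernel. All names are hypothetical.

```python
import math

EPANECHNIKOV_INT_K2 = 0.6  # integral of k^2 for k(v) = 0.75 (1 - v^2) on [-1, 1]

def hazard_ci(h_hat, lbar_hat, n, b, int_k2=EPANECHNIKOV_INT_K2, z=1.96):
    """Plug-in interval h_hat +/- z * sqrt(h_hat * int_k2 / (lbar_hat * n * b))."""
    half = z * math.sqrt(h_hat * int_k2 / (lbar_hat * n * b))
    return h_hat - half, h_hat + half

def lbar_empirical(data, x):
    """Fraction of observed X_i exceeding x, an estimate of P(T > x, C > x)."""
    return sum(1 for xi, _ in data if xi > x) / len(data)

lo, hi = hazard_ci(h_hat=1.0, lbar_hat=0.5, n=1000, b=0.1)
print(lo, hi)
```

The interval widens as $\bar L(x)$ decreases, reflecting that heavier censoring or a point further in the tail leaves fewer observations at risk.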

Remark 1.

Tanner and Wong (1983) tackled the asymptotic normality question by
Hájek's projection. Their centering constant is $E\, h_n^{(2)}(x)$, thus bypassing
the bias issue. They also imposed a compatibility condition on the kernel
$k$ with respect to both $F$ and $G$. Such a condition is met by kernels
satisfying (k1)-(k4).

We now turn our attention to the study of the MSE of $h_n^{(1)}(x)$. Write

(37)  $E[h_n^{(1)}(x) - h(x)]^2 = E[\mathrm{I} + \mathrm{II} + \mathrm{III}]^2,$

where

$\mathrm{I} = f_n(x) \left[\frac{1}{\bar\Gamma_n(x)} - \frac{1}{\bar F(x)}\right],$

$\mathrm{II} = [f_n(x) - E f_n(x)] / \bar F(x),$

$\mathrm{III} = [E f_n(x) - f(x)] / \bar F(x).$

We will show that the main contribution comes from $E(\mathrm{II}^2)$ and $E(\mathrm{III}^2)$, all
other terms in the quadratic expansion being of smaller order. Note also
that III is deterministic. Now

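(38)  $E(\mathrm{II}^2) = [\bar F(x)]^{-2} \cdot \mathrm{Var}\, f_n(x)$
      $= \frac{h(x)}{\bar L(x)} \int_{-c}^{c} k^2(v)\, dv \cdot \frac{1}{n b_n} + O(n^{-1}) + O((a_n/b_n)^2) + O((a_n/b_n)(n b_n)^{-1/2}),$

(39)  $E(\mathrm{III}^2) = [\bar F(x)]^{-2}\, \beta_n^2(x) + O((a_n/b_n)^2) + \beta_n(x)\, O(a_n/b_n)$
      $= \left[\frac{f''(x)}{2\bar F(x)} \int_{-c}^{c} v^2 k(v)\, dv\right]^2 \cdot b_n^4 + o(b_n^4) + O((a_n/b_n)^2) + O(a_n b_n).$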


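To evaluate $E(\mathrm{I})$, let us first consider

(40)  $E\left[\frac{f_n(x)}{\bar\Gamma_n(x)}\right] = E\left\{\frac{f_n(x)}{\bar F(x)} \left[1 + \frac{\bar\Gamma_n(x) - \bar F(x)}{\bar F(x)}\right]^{-1}\right\}$
      $= E\left[\frac{f_n(x)}{\bar F(x)}\right] - E\left[\frac{f_n(x)}{\bar F(x)} \cdot \frac{\bar\Gamma_n(x) - \bar F(x)}{\bar F(x)} \cdot (1 + \epsilon_n)^{-2}\right]$

for some $\epsilon_n$ between 0 and $[\bar\Gamma_n(x) - \bar F(x)]/\bar F(x)$ by Taylor's expansion for
large enough $n$, since $\bar\Gamma_n(x) \ge (n+1)^{-1}$ for all $x$. Since $\|f_n(x)\|_\infty \le M$ for some
$0 < M < \infty$ by (k2) and (k3), we have that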
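(41)  $E|\mathrm{I}| = E\left|\frac{f_n(x)}{\bar F(x)} \cdot \frac{\bar\Gamma_n(x) - \bar F(x)}{\bar F(x)\,(1 + \epsilon_n)^2}\right| \le \frac{M}{\bar F(x)^2} \cdot E\left|\frac{\bar\Gamma_n(x) - \bar F(x)}{(1 + \epsilon_n)^2}\right|$
      $\le \frac{M}{\bar F(x)^2}\, \{E[\bar\Gamma_n(x) - \bar F(x)]^2\}^{1/2}\, \{E[1 + \epsilon_n]^{-4}\}^{1/2}$

by Schwarz's inequality. Now

$0 \le E[1 + \epsilon_n]^{-4} \le E\left[\frac{\bar F(x)}{\bar\Gamma_n(x)}\right]^4 + 1 = E\left[1 - \frac{\bar\Gamma_n(x) - \bar F(x)}{\bar\Gamma_n(x)}\right]^4 + 1,$

and

$E\left[\frac{\bar\Gamma_n(x) - \bar F(x)}{\bar\Gamma_n(x)}\right]^4 \le \left[\frac{d_n}{\bar F(x) - d_n}\right]^4 \cdot P(|\bar\Gamma_n(x) - \bar F(x)| \le d_n) + (n+1)^4 \cdot P(|\bar\Gamma_n(x) - \bar F(x)| > d_n)$
$\le O[d_n^4 + n^4 \cdot n^{-\beta}]$
$= O(n^{-1})$  (taking $\beta \ge 5$),

where the last inequality follows from Lemma 2 and the exponential bound
in Lemma 1 of Lo and Singh (1984), and $d_n = \tau \cdot (\log n/n)^{1/2}$ for some $\tau > 0$.
Hence from Hölder's inequality, we have

(44)  $E[1 + \epsilon_n]^{-4} = O(1).$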

Applying Lemma 2 once more, one can show that

(45)  $E[\bar\Gamma_n(x) - \bar F(x)]^2 = O(n^{-1}).$

It now follows from (41), (44) and (45) that $E|\mathrm{I}| = O(n^{-1/2})$. The term
$E(\mathrm{I}^2)$ can be shown in a similar fashion to be of order $O(n^{-1})$. Hence
from (29) and (38),

(46)  $E|\mathrm{I} \cdot \mathrm{III}| = |\mathrm{III}| \cdot E|\mathrm{I}| = O(n^{-1/2} b_n^2) + O(n^{-1/2}(a_n/b_n)),$

(47)  $E|\mathrm{I} \cdot \mathrm{II}| = O(n^{-1/2} \cdot (n b_n)^{-1/2}) + O(n^{-1/2} \cdot (a_n/b_n)) + O(n^{-1/2} \cdot (a_n/b_n)^{1/2} \cdot (n b_n)^{-1/4}).$

Let $b_n$ be of the form $c\, n^{-p}$, where $c$, $p$ are both positive constants.
For $0 < p \le 1/4$, $O(n^{-1/2} b_n^2)$ and $O(n^{-1/2}(n b_n)^{-1/2})$ are the dominating
terms in $E|\mathrm{I} \cdot \mathrm{III}|$ and $E|\mathrm{I} \cdot \mathrm{II}|$ respectively.

For $1/4 < p \le 1/2$, $O(n^{-1/2}(a_n/b_n))$ and $O(n^{-1/2}(a_n/b_n)^{1/2}(n b_n)^{-1/4})$
are the dominating terms in $E|\mathrm{I} \cdot \mathrm{III}|$ and $E|\mathrm{I} \cdot \mathrm{II}|$ respectively.

For $p > 1/2$, $O(n^{-1/2}(a_n/b_n))$ is the dominating term in both $E|\mathrm{I} \cdot \mathrm{III}|$ and
$E|\mathrm{I} \cdot \mathrm{II}|$.

Since $b_n^4$ dominates $n^{-1/2} b_n^2$ for $p \le 1/4$, and $(n b_n)^{-1}$ dominates $n^{-1/2} b_n^2$
for $p \ge 1/6$, the term $O(n^{-1/2} b_n^2)$ is always dominated by either $b_n^4$ or
$(n b_n)^{-1}$ for any $p > 0$.

Also, $(n b_n)^{-1}$ always dominates $n^{-1/2}(n b_n)^{-1/2}$, $n^{-1/2}(a_n/b_n)$ and
$n^{-1/2}(a_n/b_n)^{1/2}(n b_n)^{-1/4}$ for any $p > 0$.

$E(\mathrm{II}^2)$ and $E(\mathrm{III}^2)$ will be the main contribution to the MSE of
$h_n^{(1)}(x)$. Furthermore, if $\{b_n\}$ satisfies (b1) and (b2), $(n b_n)^{-1/2}$
will dominate $(a_n/b_n)$. Also either $b_n^4$ or $(n b_n)^{-1}$ will dominate $a_n b_n$.
We now state our finding:


Theorem 3:  Suppose $f$ is twice continuously differentiable at $x$, $f(x) > 0$,
the kernel $k$ satisfies (k1)-(k4), and the band sequence $\{b_n\}$ satisfies (b1)
and (b2). Then

(48)  $MSE[h_n^{(1)}(x)] = \left[\frac{f''(x)}{2\bar F(x)} \int_{-c}^{c} v^2 k(v)\, dv\right]^2 \cdot b_n^4 + \left[\frac{h(x)}{\bar L(x)} \int_{-c}^{c} k^2(v)\, dv\right] \cdot \frac{1}{n b_n} + o\left(b_n^4 + \frac{1}{n b_n}\right).$
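The two leading terms of an expansion of this form pull in opposite directions, so a balancing bandwidth can be computed in closed form. The sketch below is an illustration under assumptions: it writes the leading terms as $A^2 b^4 + V/(nb)$, with $A$ standing for the bias constant and $V$ for the variance constant, and minimizes over $b > 0$ by setting the derivative to zero, giving $b^* = (V/(4 n A^2))^{1/5}$. The numeric values and names are hypothetical.

```python
def amse(b, n, bias_const, var_const):
    """Leading terms: bias_const**2 * b**4 + var_const / (n * b)."""
    return bias_const ** 2 * b ** 4 + var_const / (n * b)

def optimal_bandwidth(n, bias_const, var_const):
    """Minimizer of amse over b > 0: (var_const / (4 n bias_const^2))^(1/5)."""
    return (var_const / (4.0 * n * bias_const ** 2)) ** 0.2

n, bias_const, var_const = 1000, 0.3, 1.2
b_star = optimal_bandwidth(n, bias_const, var_const)
print(b_star, amse(b_star, n, bias_const, var_const))
```

The minimizer is of order $n^{-1/5}$, the familiar rate for second-order kernels, and the resulting mean squared error is of order $n^{-4/5}$.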


Concluding  Comments 

(a)  We have seen in the above discussion the use of Lo and Singh's
(1984) strong representation of the Kaplan-Meier estimator in
analyzing kernel estimation of hazard rate functions. We have
chosen to consider the estimates given by $h_n^{(1)}(x)$ as contrasted
with $h_n^{(2)}(x)$ studied by Yandell (1983). Our variance expression
and asymptotic normality results are similar to theirs, although
we have employed a more traditional approach. The bias for the
three variants appears to be different in the scale constant but
not the rate.

(b)  Tanner (1983) mentioned that a nearest-neighbor approach may be
preferable to the fixed band sequence approach, based on an extensive
simulation experiment. This observation appears to have some
theoretical support judging from the recent work of Liu and
Van Ryzin (1985), which essentially used an asymmetric nearest-neighbor
window. Both their findings (Theorems 4.3 and 4.4) and
the findings of some other researchers on nearest-neighbor
density estimation with censored data (for instance Mielniczuk
(1984)) suggest that the censoring mechanism may have no effect
on the variance for nearest-neighbor estimates. This may be an
advantage in terms of constructing a confidence interval at a
fixed point or a simultaneous confidence band if one wants to
test for goodness-of-fit. Nevertheless, one cautions that the
bias behavior of the Liu and Van Ryzin variable histogram
estimator suffers essentially the same drawback as nearest-neighbor
estimators in that it may be quite large in the tail
regions of $F$.

(c)  A number of researchers in kernel estimation have studied the
effects of kernels which may have vanishing moments. Their use,
coupled with the assumption of a higher degree of smoothness of
$h(x)$, can make the convergence of the bias to zero faster. This
point of view was taken in Singpurwalla and Wong (1983). Of
course one pays the price that the estimator so constructed may
take on negative values if the sample size is not "large enough."
For this reason we have kept the non-negativity of the kernel in
this paper.


Lemma 4.  For any $a > 0$ and $0 < b < 1$, we have

$P\{\sup_{0 \le x \le T} |R_{n1}(x)| > a\, n^{-b}\} \le 2e^{-n(1-q)/36}$

for large $n$.

Proof. 

iw-'l  <  s*Uo*(^)  +  » ’  fTTT1, 

where  £*  sums  over  all  i  such  that  X(jj<t  and  «* 


$P\left(\frac{3K}{n(n-K)} > a\, n^{-b}\right) = P\left(K > \frac{a\, n^{2-b}}{3 + a\, n^{1-b}}\right)$
$\le P(K > (1+q)n/2)$,  for large $n$
$= P(K - nq > (1-q)n/2)$
$\le 2e^{-n(1-q)/36},$

where the last inequality follows from Lemma 1 of Lo and Singh (1984) by
letting $\eta_i = I(X_i \le T,\ \delta_i = 1) - q$, $c = 1$, $d = n(1-q)/6$, $\sigma^2 = q(1-q)$ and $z = d/6
= n(1-q)/36$.

Lemma 4 now follows immediately.  ■

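Lemma 5.  For any $\epsilon > 0$, $P\{\sup_{0 \le x \le T} |R_{n2}(x)| > \epsilon\} \le a^* e^{-b^* n \epsilon}$ for some positive
constants $a^*$ and $b^*$.

Proof.  Note that $0 \le R_{n2}(x) \le R_{n2}(T)$. We have
$|R_{n2}(T)| \le \epsilon_0^{-2}\, [\bar L_n(T)]^{-1}\, \|L_n - L\|_T^2$, where $\|\cdot\|_T$ is the sup-norm of a
function over the interval $[0,T]$.

Hence,

$P(|R_{n2}(T)| > \epsilon)$
$\le P\{[\bar L_n(T)]^{-1}\, \|L_n - L\|_T^2 > \epsilon\,\epsilon_0^2,\ \bar L_n(T) > \epsilon_0/2\} + P\{\bar L_n(T) \le \epsilon_0/2\}$
$\le P\{\|L_n - L\|_T^2 > \epsilon\,\epsilon_0^3/2\} + P\{\bar L_n(T) \le \epsilon_0/2\}$
$= \mathrm{I} + \mathrm{II}.$

Lemma 2 of Dvoretzky, Kiefer and Wolfowitz (1956) implies that

$\mathrm{I} \le \text{constant} \cdot e^{-n \epsilon\, \epsilon_0^3}$,  and
$\mathrm{II} = P\{\bar L_n(T) - \epsilon_0 \le -\epsilon_0/2\} \le \text{constant} \cdot e^{-n \epsilon_0^2/2}.$

Lemma 5 is thus proved.  ■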

Lemma 6.  If $F$ is continuous, for any $\beta > 0$, there exists a constant $\eta > 0$
such that

$P\left(\sup_{0 \le x \le T} |R_{n3}(x)| > \eta \cdot (\log n/n)^{3/4}\right) = O(n^{-\beta}).$

Proof.  We shall give the proof for the case when $G$ is also assumed to be
continuous, and hence $L$ is continuous.

The proof parallels that of Lemma 2 of Lo and Singh (1984),
with more rigorous probability statements. The proof when $G$ is arbitrary
can be done similarly, as in the remark on page 10 of Lo and Singh (1984).

We shall now proceed with the proof when both $F$ and $G$ are assumed to be
continuous.

Divide the interval $[0,T]$ into subintervals $[x_i, x_{i+1}]$, $i = 0,\ldots,k_n$,
where $k_n = O((n/\log n)^{1/2})$ and $0 = x_0 < x_1 < \ldots < x_{k_n+1} = T$ are such that
$L(x_{i+1}) - L(x_i) \le c_1 \cdot (\log n/n)^{1/2}$. This is possible because $L$ is
assumed to be continuous. For any $0 \le x \le T$, we have

$|R_{n3}(x)| \le k_n \cdot \sup_{0 \le t \le T} |[\bar L_n(t)]^{-1} - [\bar L(t)]^{-1}| \cdot \max_{1 \le i \le k_n} |(L_{1n} - L_1)(x_{i+1}) - (L_{1n} - L_1)(x_i)|$
$+\ 2 \max_{0 \le i \le k_n}\ \sup_{x_i \le t \le x_{i+1}} |[\bar L_n(t)]^{-1} - [\bar L_n(x_i)]^{-1} - [\bar L(t)]^{-1} + [\bar L(x_i)]^{-1}|$
$= A + B$
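from the proof of Lemma 2 of Lo and Singh (1984).

To estimate $B$, we further subdivide each $[x_i, x_{i+1}]$ into subintervals
$[x_{ij}, x_{i(j+1)}]$, $j = 1,\ldots,m_n$, such that $L(x_{i(j+1)}) - L(x_{ij}) \le c_2 \cdot (\log n/n)^{3/4}$
for all $i$, $j$, and $m_n = O((n/\log n)^{1/4})$. Consider

$A_i = \sup_{x_i \le t \le x_{i+1}} |[\bar L_n(t)]^{-1} - [\bar L_n(x_i)]^{-1} - [\bar L(t)]^{-1} + [\bar L(x_i)]^{-1}|$

$\le \sup_{x_i \le t \le x_{i+1}} |[L_n(t) - L(t)]\,[\bar L(t)]^{-2} - [L_n(x_i) - L(x_i)]\,[\bar L(x_i)]^{-2}| + 2\|L_n - L\|_T \cdot \|(\bar L_n \bar L)^{-1} - (\bar L)^{-2}\|_T$

$\le \sup_{x_i \le t \le x_{i+1}} |\bar L(x_i)|^{-2}\, |L_n(t) - L(t) - L_n(x_i) + L(x_i)|$
$+ \sup_{x_i \le t \le x_{i+1}} |L_n(t) - L(t)| \cdot |\bar L^{-2}(t) - \bar L^{-2}(x_i)| + 2\|L_n - L\|_T^2 \cdot \epsilon_0^{-2}\, [\bar L_n(T)]^{-1}$

$\le \max_{1 \le j \le m_n} \epsilon_0^{-2}\, |L_n(x_{ij}) - L(x_{ij}) - L_n(x_i) + L(x_i)| + c_2\, (\log n/n)^{3/4}$
$+\ 2\epsilon_0^{-4} \cdot \|L_n - L\|_T \cdot c_1 (\log n/n)^{1/2} + 2\epsilon_0^{-2}\, \|L_n - L\|_T^2\, [\bar L_n(T)]^{-1}.$

We have

$P\{A_i > [c_2 + 9\epsilon_0^{-2}(c_1\beta)^{1/2}] \cdot (\log n/n)^{3/4}\}$

$\le P\{\max_{1 \le j \le m_n} |L_n(x_{ij}) - L(x_{ij}) - L_n(x_i) + L(x_i)| > 3(c_1\beta)^{1/2} (\log n/n)^{3/4}\}$
$+\ P\{\|L_n - L\|_T > (3/2)\, \epsilon_0^2\, (\beta/c_1)^{1/2} \cdot (\log n/n)^{1/4}\}$
$+\ P\{\|L_n - L\|_T^2\, [\bar L_n(T)]^{-1} > (3/2)(c_1\beta)^{1/2} (\log n/n)^{3/4}\}$

$= \mathrm{I} + \mathrm{II} + \mathrm{III}.$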


From the proof of Lemma 5, $\mathrm{III} \le a' e^{-b' n^{1/4}}$ for some positive constants $a'$
and $b'$. From Lemma 2 of Dvoretzky, Kiefer and Wolfowitz (1956), $\mathrm{II} \le
a^* \cdot e^{-b^* n^{1/2}}$, for some positive constants $a^*$ and $b^*$. As for I, for any
fixed $j$, use Lemma 1 of Lo and Singh (1984) with $\frac{1}{n}\sum_i \eta_i = L_n(x_{ij}) - L(x_{ij}) -
L_n(x_i) + L(x_i)$, $c = 1$, $\sigma^2 \le c_1 \cdot (\log n/n)^{1/2}$, $z = \beta \log n$, $d = (c_1\beta)^{1/2}\, n^{1/4}
(\log n)^{3/4}$. We have $cz \le d$ for large $n$, and $n z \sigma^2 \le d^2$.
The Bonferroni inequality then implies $\mathrm{I} \le 2 m_n\, e^{-\beta \log n} = 2 m_n\, n^{-\beta}$.

So far we have shown that, for any positive $\beta$ there exists a positive
constant $\mu$ such that

$P\{A_i > \mu \cdot (\log n/n)^{3/4}\} \le 2 m_n\, n^{-\beta} + a' e^{-b' n^{1/4}} + a^* e^{-b^* n^{1/2}}$
$= 2 m_n\, n^{-\beta} + O(n^{-\beta}).$

Applying the Bonferroni inequality once more, we have

$P\{B > 2\mu \cdot (\log n/n)^{3/4}\} \le (k_n + 1)\, m_n \cdot O(n^{-\beta})$,  for $\beta > 1$,
$= O(n^{-\beta})$,  for $\beta > 0$.

To estimate $A$, use the fact that $|L_1(x) - L_1(y)| \le |L(x) - L(y)|$ for any
$x$ and $y$. Apply Lemma 1 of Lo and Singh (1984) again as we did for the
term I above, and we have, with $Q_1 = \max_{1 \le i \le k_n} |(L_{1n} - L_1)(x_{i+1}) - (L_{1n} - L_1)(x_i)|$,

$P\{Q_1 > \text{constant} \cdot (\log n/n)^{3/4}\} \le k_n \cdot O(n^{-\beta}).$
Hence

$P(A > \text{constant} \cdot (\log n/n)^{3/4})$

$= P\{k_n \cdot \|(\bar L_n)^{-1} - (\bar L)^{-1}\|_T \cdot Q_1 > \text{constant} \cdot (\log n/n)^{3/4}\}$

$\le P\{k_n \cdot \|L_n - L\|_T \cdot [\bar L_n(T)]^{-1}\, \epsilon_0^{-1}\, Q_1 > \text{constant} \cdot (\log n/n)^{3/4}\}$

$\le P\{[\bar L_n(T)]^{-1}\, Q_1 > \text{constant} \cdot (\log n/n)^{3/4}\} + P\{k_n \cdot \|L_n - L\|_T > \beta^{1/2}\}$

$\le P\{Q_1 > \text{constant} \cdot (\log n/n)^{3/4}\} + P\{\bar L_n(T) < \epsilon_0/2\} + P\{k_n \cdot \|L_n - L\|_T > \beta^{1/2}\}$

$\le k_n \cdot O(n^{-\beta}) + \text{constant} \cdot e^{-n \epsilon_0^2/2} + \text{constant} \cdot e^{-2\beta \log n}$

for arbitrary $\beta > 1$, where the second term was computed in Lemma 5 and the
third term comes from Lemma 2 of Dvoretzky, Kiefer and Wolfowitz (1956),

$= O(n^{-\beta})$  for arbitrary $\beta > 0$.

We have thus shown Lemma 6.  ■